AITopics | evolutionary stochastic gradient descent

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Neural Information Processing SystemsMar-16-2026, 21:27:45 GMT

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Neural Information Processing SystemsNov-20-2025, 22:18:22 GMT

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.

evolutionary stochastic gradient descent, name change, optimization, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Xiaodong Cui, Wei Zhang, Zoltán Tüske, Michael Picheny

Neural Information Processing SystemsNov-20-2025, 16:51:55 GMT

The complementarity between SGD and EA is worth investigating.

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Add feedback

Reviews: Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Neural Information Processing SystemsOct-8-2024, 21:06:19 GMT

The paper combines ES and SGD in a complementary way, not viewing as an alternative to each other. More intuitively, in the course of optimization, the author proposes to use evolutionary strategy to make optimiser adjust at different sections (geometry) of optimisation path. Since the geometry of the loss surface can differ drastically at different locations, it is reasonable to use different gradient based optimisers with different hyper parameters at different locations. Then the main question becomes how to choose a proper optimiser at different locations. The paper proposes ES for that.

esgd hyperparameter, hyperparameter, optimizer, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Cui, Xiaodong, Zhang, Wei, Tüske, Zoltán, Picheny, Michael

Neural Information Processing SystemsFeb-14-2020, 17:43:33 GMT

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.

deep neural network, evolutionary stochastic gradient descent, optimization, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Cui, Xiaodong, Zhang, Wei, Tüske, Zoltán, Picheny, Michael

Neural Information Processing SystemsDec-31-2018

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Cui, Xiaodong, Zhang, Wei, Tüske, Zoltán, Picheny, Michael

Neural Information Processing SystemsDec-31-2018

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Cui, Xiaodong, Zhang, Wei, Tüske, Zoltán, Picheny, Michael

arXiv.org Machine LearningOct-15-2018

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1810.06773

Country: North America (0.46)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Filters

Collaborating Authors

evolutionary stochastic gradient descent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Reviews: Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks